
R Workshop
Fun Fact
- If you’re someone who likes to use
to run data analysis, you’re already “using”
Downloading R

You’ll want to download R first before downloading any additional software
You can download the newest version of R (4.2) here
Downloading RStudio
RStudio is the last software program you’ll need to get started
You can download the newest version of RStudio (Dec 2022) here
R “Quirks”
- R is case sensitive, so watch your spelling and the case you use
- Case =/= case
- R hates spaces in variable names. Code will not run if a name contains a space
- variable_1 is a GOOD variable name
- variable 1 is a BAD variable name
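Here is a quick sketch of both quirks. The object names (`Score`, `score`, `variable_1`) are made up for illustration and are not part of the workshop files.

```r
# R treats names that differ only by case as different objects
Score <- 10
score <- 20
Score == score   # FALSE: these are two separate objects

# Underscores are fine in names
variable_1 <- 5

# A space inside a name is a syntax error.
# try() catches the error so the rest of the script keeps running.
bad <- try(parse(text = "variable 1 <- 5"), silent = TRUE)
inherits(bad, "try-error")   # TRUE
```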
Downloading Materials For Day 1
You can download all the materials for Day 1 of this workshop here
- You want the correlation.qmd, data_clean.qmd, ttest.qmd, and regression.qmd files
Installing and Loading Packages
# To Install a Package
install.packages("tidyverse")
# To Load a Package
library(tidyverse)
You only have to install a package once (one exception)
You must load a library every time you open an R file (.R, .qmd, .rmd, etc) or restart R/RStudio
Important
A new install of R will remove all installed packages. You must either re-install the packages or save them prior to a new R installation. I’ll cover how to save them at a later date
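One way to follow the "install once, load every time" rule is to check whether a package is already installed before installing it. This is only a sketch: `needs_install()` is a hypothetical helper written for this example, not a function from any package.

```r
# Hypothetical helper: TRUE when a package is NOT installed yet
needs_install <- function(pkg) {
  !requireNamespace(pkg, quietly = TRUE)
}

# "stats" ships with every R installation, so nothing needs installing
needs_install("stats")   # FALSE

# Typical use at the top of a script
# (commented out here so nothing gets installed by accident):
# if (needs_install("tidyverse")) install.packages("tidyverse")
# library(tidyverse)
```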
Importing Data
R works best with csv files (smaller in size) but it will take .sav files (SPSS) and other file formats as well (e.g., .tsv)
Oh and obviously it can read Microsoft Excel files
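A minimal import sketch using base R. The file here is a throwaway temporary file created just so the example runs on its own; in practice you would point `read.csv()` at your own file. The commented lines assume the haven and readxl packages are installed, and the file names in them are placeholders.

```r
# Create a small CSV in a temporary location so the example is self-contained
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, score = c(4.5, 3.2, 5.0)), tmp, row.names = FALSE)

# Read the CSV back into a data frame
df <- read.csv(tmp)
str(df)

# Other formats (placeholder file names; packages must be installed first):
# df <- haven::read_sav("file.sav")       # SPSS
# df <- readxl::read_excel("file.xlsx")   # Microsoft Excel
# df <- read.delim("file.tsv")            # tab-separated, base R
```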
Variable Types
- Numerical
- A positive or negative number between (\(-\infty\), \(\infty\))
- Integer
- A positive or negative whole number between (\(-\infty\), \(\infty\))
- Factor
- A grouping category
- Character
- A text string
- Logical
- A TRUE or FALSE value (e.g., Is X > 1?)
- Date
- Exactly what you think it is
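To see these types in action, you can create one value of each and ask R what it is with `class()`. The object names are made up for illustration.

```r
num <- 3.14                                           # Numerical
int <- 42L                                            # Integer (the L suffix asks for a whole number)
grp <- factor(c("control", "treatment", "control"))   # Factor
txt <- "free response answer"                         # Character
lgl <- 3 > 1                                          # Logical
dt  <- as.Date("2022-12-01")                          # Date

class(num)   # "numeric"
class(int)   # "integer"
class(grp)   # "factor"
class(txt)   # "character"
class(lgl)   # "logical"
class(dt)    # "Date"
```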
{dplyr} R package
Uses for the dplyr package
- Primary use is to transform and manipulate data in a data set
- Calculate means, log transform, compute basic summary statistics
- Anyone here who has worked with database data will see that it mimics SQL programming
Note
For anyone who might work with database data, you can pull data from external databases with R and RStudio. We won’t cover that in this workshop but you can do it
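A small sketch of the kind of thing dplyr is used for, assuming the dplyr package is installed. It uses R's built-in `mtcars` data as a stand-in, not the workshop data, and the column names `log_mpg`, `mean_mpg`, and `mean_log_mpg` are made up for the example.

```r
library(dplyr)

summary_tbl <- mtcars %>%
  filter(hp > 100) %>%                 # keep rows, like WHERE in SQL
  mutate(log_mpg = log(mpg)) %>%       # add a log-transformed column
  group_by(cyl) %>%                    # like GROUP BY in SQL
  summarise(
    n            = n(),                # rows per group
    mean_mpg     = mean(mpg),
    mean_log_mpg = mean(log_mpg)
  )

summary_tbl
```

Each step reads top to bottom, which is a big part of why the package feels familiar to SQL users.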
Live-ish Coding
- Open the data_clean.qmd file provided. We’re going to walk through some example code and then live code a little bit
{stringr} R package
Uses for the stringr package
- Primarily used for dealing with character or string data
- Useful for free response questions
- Essentially it’s the dplyr package for string variable types
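A short sketch of cleaning free response text, assuming the stringr package is installed. The `responses` vector is invented for the example.

```r
library(stringr)

responses <- c("  I LOVE   stats ", "r is hard", "Stats? no thanks")

# Remove extra whitespace, then make everything lower case
clean <- str_to_lower(str_squish(responses))
clean            # "i love stats" "r is hard" "stats? no thanks"

# Flag responses that mention "stats"
mentions_stats <- str_detect(clean, "stats")
mentions_stats   # TRUE FALSE TRUE
```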
Live-ish Coding
- Go back to the data_clean.qmd file provided. We’re going to walk through some example code and then live code a little bit
{lubridate} R package
Uses for the lubridate package
- Primarily used for dealing with dates
- Provides handy functions for converting date formats into other date formats
- E.g., MM-DD-YY to DD-MM-YY or Month Date, Year
Tip
It won’t auto-convert your dates to weird incorrect formats like certain spreadsheet programs might.
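A minimal sketch of the conversion described above, assuming the lubridate package is installed. The date string is made up. `mdy()` reads a month-day-year string, and base R's `format()` writes the date back out in whatever layout you want.

```r
library(lubridate)

# Parse a MM-DD-YY string into a proper Date
d <- mdy("12-01-22")
d                        # 2022-12-01

# Write it back out as DD-MM-YY
format(d, "%d-%m-%y")    # "01-12-22"

# Or as Month Date, Year (the month name depends on your computer's language settings)
format(d, "%B %d, %Y")
```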
Live-ish Coding
- Go back to the data_clean.qmd file provided. We’re going to walk through some example code and then live code a little bit
{ggplot2} R package
Uses for the ggplot2 package
- This may be the most popular package download in R and it’s probably not close
- This is THE visualization package in R. If you can THINK of a graphic, this package can create it
- E.g. box plots, box and whisker plots, violin plots, bar graphs, etc
- If you’re REALLY good, you can do this (credit @ralitza_s)
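For something far more modest than that, here is a sketch of a basic box plot, assuming the ggplot2 package is installed. It uses the built-in `mtcars` data as a stand-in for the workshop data.

```r
library(ggplot2)

# Map variables to axes, then add a layer that draws the boxes
p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot() +
  labs(x = "Number of cylinders", y = "Miles per gallon")

p   # printing the object draws the plot
```

Swapping `geom_boxplot()` for `geom_violin()` gives a violin plot of the same data.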
“Live”ish Coding
- Go back to the data_clean.qmd file provided. We’re going to walk through some example code and then live code a little bit
Exporting Data
Sometimes you want or need to export data you’ve cleaned to another program. Maybe you want to use a program like JASP
Or you’re not comfortable using R for analyses yet so you want to use SPSS
Or maybe others on your team use a different program
R can export to Excel, SPSS, SAS, and CSV files
# To Export to Excel
library(openxlsx)
write.xlsx(df, file = "filename.xlsx")
# To Export to CSV
write.csv(df, file = "filename.csv")
Lunch Break
- Back at 2pm
Putting It All Together From Start To Finish
Statistical Analyses in R
Correlation
Assumptions (Fields et al, 2012)
- On at least an interval scale
- Normality of Residuals
Live-ish Coding
- Please see correlation.qmd file provided
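As a rough sketch of what this looks like in base R (the correlation.qmd file may do it differently), here is a Pearson correlation on the built-in `mtcars` data, with a Shapiro-Wilk test on the residuals as one possible normality check.

```r
# Pearson correlation between fuel economy and weight
res <- cor.test(mtcars$mpg, mtcars$wt, method = "pearson")
res$estimate   # the correlation coefficient r
res$p.value

# One way to check normality of residuals:
# fit the matching simple regression and test its residuals
resid_check <- shapiro.test(residuals(lm(mpg ~ wt, data = mtcars)))
resid_check$p.value
```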
T Tests
Assumptions (Fields et al, 2012)
- Normality of Residuals
- Independent Observations
- Homogeneity of Variance
Live-ish Coding
- Please see the ttest.qmd file provided
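A hedged base R sketch of an independent samples t-test with two of the assumption checks (the ttest.qmd file may use other functions). It compares `mpg` across the two transmission types in the built-in `mtcars` data. Independence of observations comes from the study design, so there is no line of code for it here.

```r
# Homogeneity of variance: F test comparing the two group variances
var_check <- var.test(mpg ~ am, data = mtcars)
var_check$p.value

# Normality of residuals (each score minus its group mean)
norm_check <- shapiro.test(residuals(lm(mpg ~ factor(am), data = mtcars)))
norm_check$p.value

# The t-test itself. R's default is Welch's version,
# which does not assume equal variances
tt <- t.test(mpg ~ am, data = mtcars)
tt
```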
Regression
Assumptions (Fields et al, 2012)
- Outliers and Influential Cases
- Normality of Residuals
- Independent Observations
- Homogeneity of Variance
Important
While important, outliers and influential cases rarely influence results when the sample size is sufficient. It is also difficult to say what “is” and “isn’t” an outlier, and flagging a case as an outlier shouldn’t always mean removing it
Live-ish Coding
- Please open the regression.qmd file provided
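A minimal regression sketch in base R using the built-in `mtcars` data (regression.qmd may take a different route). The 4/n cutoff for Cook's distance is a common rule of thumb and is an assumption here, not something from the slides. Cases are flagged for a closer look, not removed.

```r
# Predict fuel economy from weight and horsepower
fit <- lm(mpg ~ wt + hp, data = mtcars)
summary(fit)

# Normality of residuals
shapiro.test(residuals(fit))

# Influential cases: Cook's distance, flagged with a rule-of-thumb cutoff
cd <- cooks.distance(fit)
flagged <- names(cd)[cd > 4 / nrow(mtcars)]
flagged   # inspect these rows, do not automatically delete them

# Homogeneity of variance: residuals vs fitted values plot
plot(fit, which = 1)
```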
End of Day 1
Downloading Material For Day 2
You can download all the materials for Day 2 of this workshop here
- You want the anova.qmd, nonparametric.qmd, intro_quarto.qmd, mlm.qmd, sem.qmd, and factor_analysis.qmd files
ANOVA: Including Repeated Measures & Factorial
Assumptions (Fields et al, 2012)
- Normality Within Groups
- Homogeneity of Variance
- Independent Observations
Live-ish Coding
- Please open the anova.qmd file provided
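A sketch of a simple one-way ANOVA in base R using the built-in `PlantGrowth` data (anova.qmd covers more, including repeated measures and factorial designs, and may use different functions). Bartlett's test is used here as one option for checking homogeneity of variance.

```r
# Normality within groups: one Shapiro-Wilk p-value per group
norm_p <- tapply(PlantGrowth$weight, PlantGrowth$group,
                 function(x) shapiro.test(x)$p.value)
norm_p

# Homogeneity of variance across groups
bartlett.test(weight ~ group, data = PlantGrowth)

# One-way ANOVA and follow-up pairwise comparisons
fit <- aov(weight ~ group, data = PlantGrowth)
summary(fit)
TukeyHSD(fit)
```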
Non-Parametric Tests
- Wilcoxon Rank-Sum Test (i.e., Mann–Whitney Test)
- Non-parametric equivalent of the independent samples t-test
- Wilcoxon Signed-Rank Test
- Non-parametric equivalent of the dependent sample t-test
- Kruskal–Wallis Test
- Non-parametric equivalent of an ANOVA
- Friedman’s Test
- Non-parametric equivalent of a repeated measures ANOVA
Live-ish Coding
- Please open the nonparametric.qmd file
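All four tests above are available in base R. This is a sketch using built-in data sets as stand-ins, and the `scores` matrix is simulated purely for illustration (10 made-up people measured at 3 time points).

```r
# Wilcoxon rank-sum (Mann-Whitney): two independent groups
rank_sum <- wilcox.test(mpg ~ am, data = mtcars, exact = FALSE)

# Wilcoxon signed-rank: the same people measured twice
x <- sleep$extra[sleep$group == "1"]
y <- sleep$extra[sleep$group == "2"]
signed_rank <- wilcox.test(x, y, paired = TRUE, exact = FALSE)

# Kruskal-Wallis: three or more independent groups
kw <- kruskal.test(weight ~ group, data = PlantGrowth)

# Friedman: repeated measures (rows = people, columns = time points)
set.seed(1)
scores <- matrix(rnorm(30), nrow = 10, ncol = 3,
                 dimnames = list(NULL, c("t1", "t2", "t3")))
fr <- friedman.test(scores)

rank_sum$p.value; signed_rank$p.value; kw$p.value; fr$p.value
```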
EFA & CFA
EFA Assumptions (Fields et al, 2012)
- Sufficient Sample Size
- Normality of Items
- Correlation Between Items
- Appropriate Determinant (Det \(> 1 \times 10^{-5}\))
Important
- We want items to correlate, but we do not want them to correlate too weakly (r \(<\) .30) or too strongly (r \(>\) .80) across multiple items
CFA Assumptions
- Multivariate Normality
Live-ish Coding
- Please open the factor_analysis.qmd file
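Two of the EFA checks above (the determinant and the size of the item correlations) can be sketched in base R. The columns of the built-in `mtcars` data are used as stand-in "items" only so the code runs; they are not questionnaire items, and factor_analysis.qmd may use dedicated packages instead.

```r
# Stand-in "items" (assumption: any set of numeric columns works for the demo)
items <- mtcars[, c("mpg", "disp", "hp", "drat", "wt", "qsec")]

# Correlation matrix of the items
R <- cor(items)

# Determinant check: we want it above 1 x 10^-5
det_ok <- det(R) > 1e-5
det_ok

# Size of the item correlations (ignoring sign), each pair counted once
r_abs <- abs(R[upper.tri(R)])
sum(r_abs < .30)   # how many pairs correlate too weakly
sum(r_abs > .80)   # how many pairs correlate too strongly
```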
Lunch Break
- Back at 2pm
SEM
Assumptions To Test (Kaplan, 2001, p. 15218)
- Multivariate Normality
- No Systematic Missing Data
- Sufficiently Large Sample Size
- Correct Model Specification
Live-ish Coding
- Please open the sem.qmd file
MLM (Fields et al, 2012)
Assumptions To Test
- Outliers and Influential Cases
- Normality of Residuals
- Independent Observations
- Homogeneity of Variance
A Note On Independence
- This assumption is not necessarily a concern given that MLM assumes observations are nested (Fields et al, 2012)
Live-ish Coding
- Please open the mlm.qmd file
Quarto: Code + Text
The Holy Grail of Reproducibility
- What if I told you that it was possible to generate 95% of what you need for a manuscript within RStudio AND you could integrate your analyses as well?
- What if I also said you could export this to Microsoft Word?
Let’s Talk About Quarto
- Please open the intro_quarto.qmd file
Final Thoughts
Firstly, thank you for your time this weekend and I hope you’ve learned something
Second, this is A LOT. I crammed about 2 years of statistical analysis time and practice into about 2 days. It’s okay and normal if your head is swimming. I’m here if anyone has questions afterward, or if you’re using R and trying to run an actual analysis. People ask me for help all the time, and I’m happy to help
Finally, one last “minor” detail. Some of you already know this, but let’s just check out the following link